Weakly Aligned Feature Fusion for Multimodal Object Detection
Abstract
To achieve accurate and robust object detection in real-world scenarios, various forms of images are incorporated, such as color, thermal, and depth. However, multimodal data often suffer from the position shift problem, i.e., the image pair is not strictly aligned, so that one object has different positions in different modalities. For deep learning methods, this problem makes it difficult to fuse multimodal features and complicates convolutional neural network (CNN) training. In this article, we propose a general multimodal detector named aligned region CNN (AR-CNN) to tackle the position shift problem. First, a region feature (RF) alignment module with an adjacent similarity constraint is designed to consistently predict the position shift between two modalities and adaptively align the cross-modal RFs. Second, we propose a novel region of interest (RoI) jitter strategy to improve robustness to unexpected shift patterns. Third, we present a new multimodal feature fusion method that selects the more reliable feature and suppresses the less useful one via feature reweighting. In addition, by locating bounding boxes in both modalities and building their relationships, we provide a novel multimodal labeling named KAIST-Paired. Extensive experiments on 2-D and 3-D object detection, with RGB-T and RGB-D datasets, demonstrate the effectiveness and robustness of our method.
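The RoI jitter idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `jitter_roi`, the zero-mean Gaussian offsets, the scaling by box width and height, and the default `sigma_x` and `sigma_y` values are all assumptions made for illustration, since the abstract does not specify them. The sketch only shows the general principle of randomly shifting a box during training so that a detector sees varied cross-modal misalignments.

```python
import random


def jitter_roi(roi, sigma_x=0.05, sigma_y=0.05, rng=None):
    """Return a copy of roi = (x1, y1, x2, y2) translated by a random offset.

    Assumption (not from the paper's abstract): offsets are zero-mean
    Gaussian and proportional to the box width and height.
    """
    rng = rng or random.Random()
    x1, y1, x2, y2 = roi
    width, height = x2 - x1, y2 - y1
    # Draw one horizontal and one vertical shift; the box size is unchanged.
    dx = rng.gauss(0.0, sigma_x) * width
    dy = rng.gauss(0.0, sigma_y) * height
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


if __name__ == "__main__":
    box = (10.0, 20.0, 50.0, 100.0)
    # A seeded generator makes the jitter reproducible across runs.
    print(jitter_roi(box, rng=random.Random(0)))
```

In this sketch only the box position changes, so the jittered box keeps its original width and height; setting both sigmas to zero returns the input box unchanged.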
Similar resources
Self-Attentive Feature-level Fusion for Multimodal Emotion Detection
Multimodal emotion recognition is the task of detecting emotions present in user-generated multimedia content. Such resources contain complementary information in multiple modalities. A stiff challenge often faced is the complexity associated with feature-level fusion of these heterogeneous modes. In this paper, we propose a new feature-level fusion method based on a self-attention mechanism. We ...
Multi Sensor Fusion for Object Detection Using Generalized Feature Models
This paper presents a multi sensor tracking system and introduces the use of new generalized feature models. Detecting and recognizing objects as self-contained parts of the real world with two or more sensors of the same or of several types requires, on the one hand, fusion methods suitable for combining the data coming from the set of sensors in an optimal manner. This is realized by a sensor fusi...
Feature-Level based Video Fusion for Object Detection
Fusion of three-dimensional data from multiple sensors has gained momentum, especially in applications pertaining to surveillance, since promising results were obtained in moving object detection. Several approaches to video fusion of visual and infrared data have been proposed in recent literature. They mainly consist of pixel-based methodologies. Surveillance is a major application of video fusio...
Object-centered Feature Selection for Weakly-Unsupervised Object Categorization
We describe a novel approach of spatio-temporal mapping of local image features, to reduce the number of input data for further object categorization. The main focus of our work is the selection of good features to learn, by achieving a precise mapping of image features either related to static objects or to background. This can be done by initial camera motion estimation, subsequent structure ...
Journal
Journal title: IEEE Transactions on Neural Networks and Learning Systems
Year: 2021
ISSN: 2162-237X, 2162-2388
DOI: https://doi.org/10.1109/tnnls.2021.3105143